126 research outputs found
Structured Bayesian methods for splicing analysis in RNA-seq data
In most eukaryotes, alternative splicing is an important regulatory mechanism of gene
expression that allows a single gene to code for multiple protein isoforms, thus
greatly increasing the diversity of the proteome. RNA-seq is widely used for genome-wide
splicing isoform quantification, and several effective and powerful methods have
been developed for splicing analysis with RNA-seq data. However, quantification remains
problematic for genes with low coverage or a large number of isoforms. These difficulties
may in principle be ameliorated by exploiting correlations encoded in structured
data sources.
This thesis contributes to the development of Bayesian methods for splicing analysis
by leveraging additional information from multiple datasets through structured prior distributions.
First, we developed DICEseq, the first isoform quantification method tailored
to time-series RNA-seq experiments. DICEseq explicitly models the correlations between
experiments at different time points to aid the quantification of isoforms across
experiments. Numerical experiments on both simulated and real datasets show that
DICEseq yields more accurate results than state-of-the-art methods, an advantage that
can become considerable at low coverage levels. Furthermore, DICEseq makes it possible to
quantify the trade-off between temporal sampling of RNA and depth of sequencing,
frequently an important choice when planning experiments.
Second, we developed BRIE (Bayesian Regression for Isoform Estimation), a Bayesian
hierarchical model which resolves the difficulties in splicing analysis in single-cell
RNA-seq (scRNA-seq) data by learning an informative prior distribution from
sequence features. This method combines quantification and imputation for splicing
analysis in a single Bayesian framework, which is particularly useful for scRNA-seq data
because of their extremely low coverage and high technical noise. We validated BRIE on several
scRNA-seq data sets, showing that BRIE yields reproducible estimates of exon inclusion
ratios in single cells. Third, we provided an effective tool that uses Bayes factors
to sensitively detect differential splicing between single cells. When applying
BRIE to a few real datasets, we found interesting heterogeneity patterns in splicing
events across cell populations, for example alternative exons in DNMT3B.
In summary, this thesis proposes structured Bayesian methods that integrate multiple
datasets to improve splicing analysis and the study of its biological functions.
Statistical modeling of isoform splicing dynamics from RNA-seq time series data
Isoform quantification is an important goal of RNA-seq experiments, yet it
remains problematic for genes with low expression or several isoforms. These
difficulties may in principle be ameliorated by exploiting correlated
experimental designs, such as time series or dosage response experiments. Time
series RNA-seq experiments, in particular, are becoming increasingly popular,
yet there are no methods that explicitly leverage the experimental design to
improve isoform quantification. Here we present DICEseq, the first isoform
quantification method tailored to correlated RNA-seq experiments. DICEseq
explicitly models the correlations between different RNA-seq experiments to
aid the quantification of isoforms across experiments. Numerical experiments on
simulated data sets show that DICEseq yields more accurate results than
state-of-the-art methods, an advantage that can become considerable at low
coverage levels. On real data sets, our results show that DICEseq provides
substantially more reproducible and robust quantifications, increasing the
correlation of estimates from replicate data sets by up to 10% on genes with
low or moderate expression levels (bottom third of all genes). Furthermore,
DICEseq makes it possible to quantify the trade-off between temporal sampling of RNA and
depth of sequencing, frequently an important choice when planning experiments.
Our results have strong implications for the design of RNA-seq experiments,
and offer a novel tool for improved analysis of such data sets. Python code is
freely available at http://diceseq.sf.net
Vireo: Bayesian demultiplexing of pooled single-cell RNA-seq data without genotype reference.
Multiplexed single-cell RNA-seq analysis of multiple samples using pooling is a promising experimental design, offering increased throughput while making it possible to overcome batch variation. To reconstruct the sample identity of each cell, genetic variants that segregate between the samples in the pool have been proposed as natural barcodes for cell demultiplexing. Existing demultiplexing strategies rely on the availability of complete genotype data from the pooled samples, which limits the applicability of such methods, in particular when genetic variation is not the primary object of study. To address this, we here present Vireo, a computationally efficient Bayesian model to demultiplex single-cell data from pooled experimental designs. Uniquely, our model can be applied in settings where only partial or no genotype information is available. Using pools based on synthetic mixtures and results on real data, we demonstrate the robustness of Vireo and illustrate the utility of multiplexed experimental designs for common expression analyses.
Screening Spin Lattice Interaction Using Deep Learning Approach
Atomic simulations hold significant value in clarifying crucial matters such
as phase transitions and energy transport in materials science. Their success
stems from the presence of potential energy functions capable of accurately
depicting the relationship between system energy and lattice changes. In
magnetic materials, two atomic scale degrees of freedom come into play: the
lattice and the magnetic moment. Nonetheless, precisely portraying the
interaction energy and its impact on lattice and spin-driving forces, such as
atomic force and magnetic torque, remains a formidable task in the
computational domain. Consequently, there is no atomic-scale approach capable
of elucidating the evolution of lattice and spin at the same time in magnetic
materials. Addressing this knowledge deficit, we present DeepSPIN, a versatile
approach that generates high-precision predictive models of energy, atomic
forces, and magnetic torque in magnetic systems. This is achieved by
integrating first-principles calculations of magnetic excited states with
advanced deep learning techniques via active learning. We thoroughly explore
the methodology, accuracy, and scalability of our proposed model in this paper.
Our technique adeptly connects first-principles computations and atomic-scale
simulations of magnetic materials. This synergy presents opportunities to
utilize these calculations in devising and tackling theoretical and practical
obstacles concerning magnetic materials. Comment: 8 pages, 4 figures
Transcriptome-wide RNA processing kinetics revealed using extremely short 4tU labeling
Background:
RNA levels detected at steady state are the consequence of multiple dynamic processes within the cell. In addition to synthesis and decay, transcripts undergo processing. Metabolic tagging with a nucleotide analog is one way of determining the relative contributions of synthesis, decay and conversion processes globally.
Results:
By improving 4-thiouracil labeling of RNA in Saccharomyces cerevisiae we were able to isolate RNA produced during as little as 1 minute, allowing the detection of nascent pervasive transcription. Nascent RNA labeled for 1.5, 2.5 or 5 minutes was isolated and analyzed by reverse transcriptase-quantitative polymerase chain reaction and RNA sequencing. High kinetic resolution enabled detection and analysis of short-lived non-coding RNAs as well as intron-containing pre-mRNAs in wild-type yeast. From these data we measured the relative stability of pre-mRNA species with different high turnover rates and investigated potential correlations with sequence features.
Conclusions:
Our analysis of non-coding RNAs reveals a highly significant association between non-coding RNA stability, transcript length and predicted secondary structure. Our quantitative analysis of the kinetics of pre-mRNA splicing in yeast reveals that ribosomal protein transcripts are more efficiently spliced if they contain intron secondary structures that are predicted to be less stable. These data, in combination with previous results, indicate that there is an optimal range of stability of intron secondary structures that allows for rapid splicing.
CRISPR/Cas9‐mediated somatic correction of a novel coagulator factor IX gene mutation ameliorates hemophilia in mouse
The X‐linked genetic bleeding disorder caused by deficiency of coagulator factor IX, hemophilia B, is a disease ideally suited for gene therapy with genome editing technology. Here, we identify a family with hemophilia B carrying a novel mutation, Y371D, in the human F9 gene. We used the CRISPR/Cas9 system to generate distinct genetically modified mouse models and confirmed that the novel Y371D mutation resulted in a more severe hemophilia B phenotype than the previously identified Y371S mutation. To develop therapeutic strategies targeting this mutation, we subsequently compared naked DNA constructs with adenoviral vectors for delivering Cas9 components targeting the F9 Y371D mutation in adult mice. After treatment, hemophilia B mice receiving naked DNA constructs exhibited correction of over 0.56% of F9 alleles in hepatocytes, which was sufficient to restore hemostasis. In contrast, the adenoviral delivery system resulted in a higher corrective efficiency but no therapeutic effects due to severe hepatic toxicity. Our studies suggest that CRISPR/Cas‐mediated in situ genome editing could be a feasible therapeutic strategy for human hereditary diseases, although an efficient and clinically relevant delivery system is required for further clinical studies.
BRIE: transcriptome-wide splicing quantification in single cells
Single-cell RNA-seq (scRNA-seq) provides a comprehensive measurement of stochasticity in transcription, but the limitations of the technology have prevented its application to dissect variability in RNA processing events such as splicing. Here, we present BRIE (Bayesian regression for isoform estimation), a Bayesian hierarchical model that resolves these problems by learning an informative prior distribution from sequence features. We show that BRIE yields reproducible estimates of exon inclusion ratios in single cells and provides an effective tool for differential isoform quantification between scRNA-seq data sets. BRIE, therefore, expands the scope of scRNA-seq experiments to probe the stochasticity of RNA processing.
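As a rough illustration of why an informative prior helps when coverage is very low, the sketch below computes a posterior mean exon inclusion ratio under a conjugate Beta prior. This is a deliberately simplified stand-in and not BRIE's actual model: the function name, the Beta prior, and the `prior_mean` and `prior_strength` parameters are assumptions made for this example. In BRIE the prior is learned from sequence features, whereas here it is simply passed in.

```python
def posterior_inclusion_ratio(inclusion_reads, exclusion_reads,
                              prior_mean, prior_strength):
    """Posterior mean of an exon inclusion ratio under a Beta prior.

    Hypothetical helper for illustration only. The prior is
    Beta(prior_mean * prior_strength, (1 - prior_mean) * prior_strength),
    so prior_strength behaves like a number of pseudo-reads.
    """
    if not 0.0 < prior_mean < 1.0:
        raise ValueError("prior_mean must lie strictly between 0 and 1")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")

    # Conjugate update: add observed read counts to the prior pseudo-counts.
    alpha = prior_mean * prior_strength + inclusion_reads
    beta = (1.0 - prior_mean) * prior_strength + exclusion_reads
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # One read in a single cell: the prior dominates the estimate.
    print(posterior_inclusion_ratio(1, 0, prior_mean=0.5, prior_strength=10))
    # Deep coverage: the data dominate the estimate.
    print(posterior_inclusion_ratio(900, 100, prior_mean=0.5, prior_strength=10))
```

With one inclusion read and no exclusion reads, the naive ratio would be 1.0, while this sketch gives 6/11 (about 0.545) for a prior mean of 0.5 and a strength of 10 pseudo-reads. With hundreds of reads the estimate moves close to the observed ratio.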
Hyperoxemia and excess oxygen use in early acute respiratory distress syndrome: Insights from the LUNG SAFE study
Publisher Copyright: © 2020 The Author(s).
Background:
Concerns exist regarding the prevalence and impact of unnecessary oxygen use in patients with acute respiratory distress syndrome (ARDS). We examined this issue in patients with ARDS enrolled in the Large observational study to UNderstand the Global impact of Severe Acute respiratory FailurE (LUNG SAFE) study.
Methods:
In this secondary analysis of the LUNG SAFE study, we wished to determine the prevalence and the outcomes associated with hyperoxemia on day 1, sustained hyperoxemia, and excessive oxygen use in patients with early ARDS. Patients who fulfilled criteria of ARDS on day 1 and day 2 of acute hypoxemic respiratory failure were categorized based on the presence of hyperoxemia (PaO2 > 100 mmHg) on day 1, sustained (i.e., present on day 1 and day 2) hyperoxemia, or excessive oxygen use (FIO2 ≥ 0.60 during hyperoxemia).
Results:
Of 2005 patients that met the inclusion criteria, 131 (6.5%) were hypoxemic (PaO2 < 55 mmHg), 607 (30%) had hyperoxemia on day 1, and 250 (12%) had sustained hyperoxemia. Excess FIO2 use occurred in 400 (66%) out of 607 patients with hyperoxemia. Excess FIO2 use decreased from day 1 to day 2 of ARDS, with most hyperoxemic patients on day 2 receiving relatively low FIO2. Multivariate analyses found no independent relationship between day 1 hyperoxemia, sustained hyperoxemia, or excess FIO2 use and adverse clinical outcomes. Mortality was 42% in patients with excess FIO2 use, compared to 39% in a propensity-matched sample of normoxemic (PaO2 55-100 mmHg) patients (P = 0.47).
Conclusions:
Hyperoxemia and excess oxygen use are both prevalent in early ARDS but are most often non-sustained. No relationship was found between hyperoxemia or excessive oxygen use and patient outcome in this cohort.
Trial registration:
LUNG-SAFE is registered with ClinicalTrials.gov, NCT02010073.
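The category definitions quoted in the abstract above (hypoxemia at PaO2 < 55 mmHg, hyperoxemia at PaO2 > 100 mmHg, excess oxygen use at FIO2 ≥ 0.60 during hyperoxemia) can be written down as a minimal sketch. The thresholds come from the abstract. The function and constant names are hypothetical and are not taken from the study's analysis code.

```python
# Thresholds as stated in the abstract; names are illustrative assumptions.
HYPOXEMIA_PAO2_MMHG = 55.0
HYPEROXEMIA_PAO2_MMHG = 100.0
EXCESS_FIO2 = 0.60


def oxygenation_category(pao2_mmhg):
    """Classify one PaO2 measurement (mmHg) into the abstract's categories."""
    if pao2_mmhg < HYPOXEMIA_PAO2_MMHG:
        return "hypoxemic"
    if pao2_mmhg > HYPEROXEMIA_PAO2_MMHG:
        return "hyperoxemic"
    # Normoxemia is given as PaO2 55-100 mmHg, so both bounds are included.
    return "normoxemic"


def excess_oxygen_use(pao2_mmhg, fio2):
    """Excess oxygen use: FIO2 >= 0.60 while the patient is hyperoxemic."""
    return pao2_mmhg > HYPEROXEMIA_PAO2_MMHG and fio2 >= EXCESS_FIO2


def sustained_hyperoxemia(pao2_day1_mmhg, pao2_day2_mmhg):
    """Sustained hyperoxemia: hyperoxemia present on both day 1 and day 2."""
    return (pao2_day1_mmhg > HYPEROXEMIA_PAO2_MMHG
            and pao2_day2_mmhg > HYPEROXEMIA_PAO2_MMHG)


if __name__ == "__main__":
    print(oxygenation_category(120.0))            # hyperoxemic
    print(excess_oxygen_use(120.0, 0.70))         # True
    print(sustained_hyperoxemia(120.0, 95.0))     # False
```

The reported proportions are consistent with these counts: 400 of 607 is about 66%, and 131 of 2005 is about 6.5%.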